The Structure Transfer Machine Theory and Applications
Representation learning is a fundamental but challenging problem, especially
when the distribution of data is unknown. We propose a new representation
learning method, termed Structure Transfer Machine (STM), which enables the feature
learning process to converge to the representation expectation in a
probabilistic way. We theoretically show that such an expected value of the
representation (mean) is achievable if the manifold structure can be
transferred from the data space to the feature space. The resulting structure
regularization term, named manifold loss, is incorporated into the loss
function of the typical deep learning pipeline. The STM architecture is
constructed to make the learned deep representation satisfy the intrinsic
manifold structure of the data, which results in robust features that suit
various application scenarios, such as digit recognition, image classification
and object tracking. Compared to state-of-the-art CNN architectures, we achieve
better results on several commonly used benchmarks. The source code is
available at https://github.com/stmstmstm/stm.
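To make the idea of a structure regularization term concrete, the following is a minimal sketch of a generic graph-based manifold penalty added to a task loss. It is not the formulation used in the STM paper: the Gaussian affinity, the names `manifold_penalty` and `total_loss`, and the parameters `sigma` and `lam` are all assumptions chosen for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manifold_penalty(inputs, features, sigma=1.0):
    """Sum of w_ij * ||f_i - f_j||^2 over all unordered pairs.

    w_ij is a Gaussian affinity computed in the data space (an assumed
    choice), so points that are close in the data space are penalised
    for being far apart in the feature space.
    """
    total = 0.0
    n = len(inputs)
    for i in range(n):
        for j in range(i + 1, n):
            w = math.exp(-sq_dist(inputs[i], inputs[j]) / (2.0 * sigma ** 2))
            total += w * sq_dist(features[i], features[j])
    return total


def total_loss(task_loss, inputs, features, lam=0.1, sigma=1.0):
    """Task loss plus the weighted structure penalty (lam is hypothetical)."""
    return task_loss + lam * manifold_penalty(inputs, features, sigma)
```

With this penalty, features that keep neighbouring inputs close score lower than features that separate them, which is the sense in which structure is transferred from the data space to the feature space.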
Local Feature Discriminant Projection
In this paper, we propose a novel subspace learning algorithm called Local Feature Discriminant Projection (LFDP) for supervised dimensionality reduction of local features. LFDP efficiently seeks a subspace that improves the discriminability of local features for classification. We make three novel contributions. First, the proposed LFDP is a general supervised subspace learning algorithm that provides an efficient way to reduce the dimensionality of large-scale local feature descriptors. Second, we introduce the Differential Scatter Discriminant Criterion (DSDC) to the subspace learning of local feature descriptors, which avoids the matrix singularity problem. Third, we propose a generalized orthogonalization method that is imposed on the projections, leading to a more compact and less redundant subspace. Extensive experimental validation on three benchmark datasets, UIUC-Sports, Scene-15 and MIT Indoor, demonstrates that the proposed LFDP outperforms other dimensionality reduction methods and achieves state-of-the-art performance for image classification.
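As an illustration of why a difference-based criterion sidesteps singularity, here is a minimal sketch that scores a single projection direction `w` by between-class scatter minus `rho` times within-class scatter. The exact objective in the LFDP paper may differ; the function name `dsdc_score`, the weight `rho`, and the restriction to one direction are assumptions made for this example.

```python
from collections import defaultdict


def project(x, w):
    """Scalar projection of sample x onto direction w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def dsdc_score(samples, labels, w, rho=1.0):
    """Between-class scatter minus rho times within-class scatter along w.

    Because the two scatters are subtracted rather than divided, no
    inverse of the within-class scatter is needed, so a singular
    within-class scatter causes no numerical problem.
    """
    z = [project(x, w) for x in samples]
    overall_mean = sum(z) / len(z)
    by_class = defaultdict(list)
    for value, label in zip(z, labels):
        by_class[label].append(value)
    between = 0.0
    within = 0.0
    for values in by_class.values():
        class_mean = sum(values) / len(values)
        between += len(values) * (class_mean - overall_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in values)
    return between - rho * within
```

A direction that separates the classes receives a high score, while a direction that only spreads samples inside each class receives a negative one.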
Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports
The way we analyse clinical texts has undergone major changes over the last
years. The introduction of language models such as BERT led to adaptations for
the (bio)medical domain like PubMedBERT and ClinicalBERT. These models rely on
large databases of archived medical documents. While performing well in terms
of accuracy, both the lack of interpretability and limitations to transfer
across languages limit their use in clinical settings. We introduce a novel
light-weight graph-based embedding method catering specifically to radiology
reports. It takes into account the structure and composition of the report,
while also connecting medical terms in the report through the multi-lingual
SNOMED Clinical Terms knowledge base. The resulting graph embedding uncovers
the underlying relationships among clinical terms, achieving a representation
that is more understandable for clinicians and more clinically accurate,
without reliance on large pre-training datasets. We show the use of this
embedding on two tasks, namely disease classification of X-ray reports and image
classification. For disease classification our model is competitive with its
BERT-based counterparts, while being orders of magnitude smaller in size and training
data requirements. For image classification, we show the effectiveness of the
graph embedding leveraging cross-modal knowledge transfer and show how this
method is usable across different languages.
Episodic Multi-Task Learning with Heterogeneous Neural Processes
This paper focuses on the data-insufficiency problem in multi-task learning
within an episodic training setup. Specifically, we explore the potential of
heterogeneous information across tasks and meta-knowledge among episodes to
effectively tackle each task with limited data. Existing meta-learning methods
often fail to take advantage of crucial heterogeneous information in a single
episode, while multi-task learning models neglect reusing experience from
earlier episodes. To address the problem of insufficient data, we develop
Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within
the framework of hierarchical Bayes, HNPs effectively capitalize on prior
experiences as meta-knowledge and capture task-relatedness among heterogeneous
tasks, mitigating data-insufficiency. Meanwhile, transformer-structured
inference modules are designed to enable efficient inferences toward
meta-knowledge and task-relatedness. In this way, HNPs can learn more powerful
functional priors for adapting to novel heterogeneous tasks in each meta-test
episode. Experimental results show the superior performance of the proposed
HNPs over typical baselines, and ablation studies verify the effectiveness of
the designed inference modules.
Comment: 28 pages, spotlight of NeurIPS 202
Order-preserving Consistency Regularization for Domain Adaptation and Generalization
Deep learning models fail on cross-domain challenges if they are
oversensitive to domain-specific attributes, e.g., lighting, background, and
camera angle. To alleviate this problem, data augmentation coupled with
consistency regularization is commonly adopted to make the model less
sensitive to domain-specific attributes. Consistency regularization forces
the model to output the same representation or prediction for two views of one
image. These constraints, however, are either too strict or not
order-preserving for the classification probabilities. In this work, we propose
the Order-preserving Consistency Regularization (OCR) for cross-domain tasks.
The order-preserving property for the prediction makes the model robust to
task-irrelevant transformations. As a result, the model becomes less sensitive
to the domain-specific attributes. Comprehensive experiments show that our
method achieves clear advantages on five different cross-domain tasks.
Comment: Accepted by ICCV 202
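The distinction between a strict consistency constraint and an order-preserving one can be illustrated with a small sketch. This is not the OCR loss from the paper: the helper names `squared_gap`, `class_order` and `same_order`, and the reading of "order-preserving" as "same ranking of class probabilities", are assumptions made for this example.

```python
def squared_gap(p, q):
    """Strict consistency: squared difference between two predictions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def class_order(p):
    """Class indices sorted from most to least probable."""
    return sorted(range(len(p)), key=lambda k: p[k], reverse=True)


def same_order(p, q):
    """True when both predictions rank the classes identically."""
    return class_order(p) == class_order(q)


# Predictions for two views of one image (made-up numbers).
view_a = [0.7, 0.2, 0.1]
view_b = [0.5, 0.3, 0.2]
```

Here `squared_gap(view_a, view_b)` is non-zero, so a strict constraint would penalise the pair, yet `same_order(view_a, view_b)` holds because both views rank the classes the same way.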
Supervised local descriptor learning for human action recognition
Local features have been widely used in computer vision tasks, e.g., human action recognition, but handling large-scale, high-dimensional local features with redundant information is extremely challenging. In this paper, we propose a novel fully supervised local descriptor learning algorithm called discriminative embedding method based on the image-to-class distance (I2CDDE) to learn compact but highly discriminative local feature descriptors for more accurate and efficient action recognition. By leveraging the advantages of the I2C distance, the proposed I2CDDE incorporates class labels to enable fully supervised learning of local feature descriptors, which yields highly discriminative but compact local descriptors. The objective of our I2CDDE is to minimize the I2C distances from samples to their corresponding classes while maximizing the I2C distances to the other classes in the low-dimensional space. To further improve the performance, we propose incorporating a manifold regularization based on the graph Laplacian into the objective function, which can enhance the smoothness of the embedding by extracting the local intrinsic geometrical structure. The proposed I2CDDE achieves fully supervised learning of local feature descriptors for the first time. It significantly improves the performance of I2C-based methods by increasing the discriminative ability of local features, while dimensionality reduction greatly lowers the computational burden of handling large-scale data. We apply the proposed I2CDDE algorithm to human action recognition on four widely used benchmark datasets. The results show that I2CDDE can significantly improve I2C-based classifiers and achieves state-of-the-art performance.
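For readers unfamiliar with the image-to-class (I2C) distance, a minimal sketch follows. It assumes the common nearest-neighbour form, in which each local descriptor of a sample is matched to its closest descriptor in a class pool and the squared distances are summed. The names `i2c_distance` and `classify` and the toy class labels are hypothetical, and the learned embedding of I2CDDE itself is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def i2c_distance(sample_descriptors, class_descriptors):
    """Sum, over the sample's local descriptors, of the squared distance
    to the nearest descriptor in the class pool."""
    return sum(
        min(sq_dist(d, c) for c in class_descriptors)
        for d in sample_descriptors
    )


def classify(sample_descriptors, class_pool):
    """Return the label whose descriptor pool gives the smallest I2C distance."""
    return min(
        class_pool,
        key=lambda label: i2c_distance(sample_descriptors, class_pool[label]),
    )


# Toy descriptor pools for two hypothetical action classes.
class_pool = {
    "walk": [[0.0, 0.0], [1.0, 0.0]],
    "run": [[5.0, 5.0], [6.0, 5.0]],
}
query = [[0.9, 0.1], [0.1, 0.0]]
```

Calling `classify(query, class_pool)` picks the class whose pool lies closest to the query descriptors. A learned projection such as the one described above would be applied to all descriptors before these distances are computed.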